167 research outputs found

    Reconciling Graphs and Sets of Sets

    Full text link
    We explore a generalization of set reconciliation, where the goal is to reconcile sets of sets. Alice and Bob each have a parent set consisting of s child sets, each containing at most h elements from a universe of size u. They want to reconcile their sets of sets in a scenario where the total number of differences between all of their child sets (under the minimum difference matching between their child sets) is d. We give several algorithms for this problem, and discuss applications to reconciliation problems on graphs, databases, and collections of documents. We specifically focus on graph reconciliation, providing protocols based on set of sets reconciliation for random graphs from G(n,p) and for forests of rooted trees.
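    As a concrete illustration of the difference measure d defined above, the brute-force Python sketch below computes the minimum difference matching between two small parent sets. The names and inputs are illustrative, and this is not one of the paper's reconciliation protocols; at scale one would use a min-cost matching algorithm rather than enumerating permutations.

    ```python
    # Brute-force computation of d: the total symmetric difference between
    # matched child sets under a minimum difference matching. Illustrative
    # only; feasible for small parent sets.
    from itertools import permutations

    def min_difference_matching(parent_a, parent_b):
        """parent_a, parent_b: lists of child sets (assumed equal length s).
        Returns the minimum total symmetric difference over all matchings."""
        assert len(parent_a) == len(parent_b)
        best = float("inf")
        for perm in permutations(range(len(parent_b))):
            total = sum(len(a ^ parent_b[j]) for a, j in zip(parent_a, perm))
            best = min(best, total)
        return best

    if __name__ == "__main__":
        alice = [{1, 2, 3}, {4, 5}]
        bob = [{4, 5, 6}, {1, 2}]
        print(min_difference_matching(alice, bob))  # d = 2 (elements 3 and 6 differ)
    ```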

    Optimal plans for aggregation

    Get PDF
    We consider the following problem, which arises in the context of distributed Web computations. An aggregator aims to combine specific data from n sources. The aggregator contacts all sources at once. The time for each source to return its data to the aggregator is independent and identically distributed according to a known distribution. The aggregator at some point stops waiting for data and returns an answer depending only on the data received so far. If the aggregator returns the aggregated information from k of the n sources at time t, it obtains a reward R_k(t) that grows with k and decreases with t. The goal of the aggregator is to maximize its expected reward. We prove that for certain broad families of distributions and broad classes of reward functions, the optimal plan for the aggregator has a simple form and hence can be easily computed.
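    To make the trade-off concrete, the sketch below Monte Carlo estimates the expected reward of a fixed-deadline plan ("stop at time t") under assumptions of my own, not the paper's: i.i.d. Exponential(1) response times and a reward of the form R_k(t) = k - c*t. It only illustrates why the expected reward has an interior optimum in t.

    ```python
    # Monte Carlo estimate of E[R_k(t)] for a fixed-deadline plan.
    # Assumed model (not from the paper): Exponential(1) response times,
    # reward R_k(t) = k - c*t, where k is the number of sources that
    # responded by the deadline t.
    import random

    def expected_reward(n, t, c, trials=20000):
        total = 0.0
        for _ in range(trials):
            k = sum(1 for _ in range(n) if random.expovariate(1.0) <= t)
            total += k - c * t
        return total / trials

    if __name__ == "__main__":
        for t in (0.5, 1.0, 2.0, 4.0):
            print(t, round(expected_reward(n=10, t=t, c=2.0), 2))
    ```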

    Delphic Costs and Benefits in Web Search: A utilitarian and historical analysis

    Full text link
    We present a new framework to conceptualize and operationalize the total user experience of search, by studying the entirety of a search journey from a utilitarian point of view. Web search engines are widely perceived as "free". But search requires time and effort: in reality there are many intermingled non-monetary costs (e.g. time costs, cognitive costs, interactivity costs) and the benefits may be marred by various impairments, such as misunderstanding and misinformation. This characterization of costs and benefits appears to be inherent to the human search for information within the pursuit of some larger task: most of the costs and impairments can be identified in interactions with any web search engine, interactions with public libraries, and even in interactions with ancient oracles. To emphasize this innate connection, we call these costs and benefits Delphic, in contrast to explicitly financial costs and benefits. Our main thesis is that the users' satisfaction with a search engine mostly depends on their experience of Delphic costs and benefits, in other words on their utility. The consumer utility is correlated with classic measures of search engine quality, such as ranking, precision, recall, etc., but is not completely determined by them. To argue our thesis, we catalog the Delphic costs and benefits and show how the development of search engines over the last quarter century, from classic Information Retrieval roots to the integration of Large Language Models, was driven to a great extent by the quest to decrease Delphic costs and increase Delphic benefits. We hope that the Delphic costs framework will engender new ideas and new research for evaluating and improving the web experience for everyone.

    Operator Drowsiness Test

    Get PDF
    This publication details a quantifiable and objective operator drowsiness test. The test takes between 30 seconds and two (2) minutes to administer. Any smartphone that has a front-facing camera and the supporting software can run the newly developed, self-administrable test. It leverages years of sleep deprivation research that have found objective correlations between drowsiness (or alertness) and physical and behavioral parameters, such as: gaze, facial features, pupil size, blink rate, blink duration, breathing, pulse, head movements, facial skin tone, speech pattern, and vocal sound. In addition, the mass use of smartphones with rear-facing and front-facing cameras gives researchers the opportunity to deploy this new operator drowsiness test to a wide audience.

    On-line load balancing

    Get PDF
    The setup for our problem consists of n servers that must complete a set of tasks. Each task can be handled only by a subset of the servers, requires a different level of service, and once assigned cannot be reassigned. We make the natural assumption that the level of service is known at arrival time, but that the duration of service is not. The on-line load balancing problem is to assign each task to an appropriate server in such a way that the maximum load on the servers is minimized. In this paper we derive matching upper and lower bounds for the competitive ratio of the on-line greedy algorithm for this problem, namely, [(3n)^{2/3}/2](1+o(1)), and derive a lower bound, Ω(n^{1/2}), for any other deterministic or randomized on-line algorithm.
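    A minimal sketch of the greedy rule the competitive ratio above refers to, assigning each arriving task to the currently least-loaded server able to handle it. The task format and names are illustrative, and the sketch ignores task completions, since durations are unknown at assignment time.

    ```python
    # On-line greedy assignment: place each task on the least-loaded
    # eligible server. In the paper tasks also finish after an unknown
    # duration; here loads only grow, which is enough to show the rule.

    def greedy_assign(num_servers, tasks):
        """tasks: iterable of (weight, allowed_servers) pairs, in arrival order.
        Returns the final load vector after greedy assignment."""
        load = [0.0] * num_servers
        for weight, allowed in tasks:
            target = min(allowed, key=lambda s: load[s])  # least-loaded eligible server
            load[target] += weight
        return load

    if __name__ == "__main__":
        tasks = [(1.0, {0, 1}), (2.0, {1, 2}), (1.5, {0, 2}), (1.0, {2})]
        print(greedy_assign(3, tasks))  # -> [1.0, 2.0, 2.5]
    ```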

    Torts

    Get PDF

    Data-driven evaluation metrics for heterogeneous search engine result pages

    Get PDF
    Evaluation metrics for search typically assume items are homogeneous. However, in the context of web search, this assumption does not hold. Modern search engine result pages (SERPs) are composed of a variety of item types (e.g., news, web, entity, etc.), and their influence on browsing behavior is largely unknown. In this paper, we perform a large-scale empirical analysis of popular web search queries and investigate how different item types influence how people interact on SERPs. We then infer a user browsing model given people's interactions with SERP items, creating a data-driven metric based on item type. We show that the proposed metric leads to more accurate estimates of: (1) total gain, (2) total time spent, and (3) stopping depth, without requiring extensive parameter tuning or a priori relevance information. These results suggest that item heterogeneity should be accounted for when developing metrics for SERPs. While many open questions remain concerning the applicability and generalizability of data-driven metrics, they do serve as a formal mechanism to link observed user behaviors directly to how performance is measured. From this approach, we can draw new insights regarding the relationship between behavior and performance, and design data-driven metrics based on real user behavior rather than using metrics reliant on some hypothesized model of user browsing behavior.
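    As a rough illustration of a type-aware browsing model (a cascade-style simplification of my own, not necessarily the model fitted in the paper), the sketch below computes the expected total gain of a SERP when the probability of scanning past a result depends on that result's item type.

    ```python
    # Expected total gain under a top-down scan in which the continuation
    # probability depends on the item type just examined. The per-type
    # probabilities below are made-up numbers; in the paper's setting they
    # would be estimated from logged interactions.

    CONTINUE = {"web": 0.8, "news": 0.7, "entity": 0.5}  # assumed values

    def expected_total_gain(serp):
        """serp: list of (item_type, gain) pairs from top to bottom."""
        p_examine = 1.0
        total = 0.0
        for item_type, gain in serp:
            total += p_examine * gain
            p_examine *= CONTINUE[item_type]
        return total

    if __name__ == "__main__":
        serp = [("news", 0.2), ("web", 1.0), ("entity", 0.4), ("web", 0.6)]
        print(round(expected_total_gain(serp), 3))
    ```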

    Nobody cares if you liked Star Wars: KNN graph construction on the cheap

    Get PDF
    K-Nearest-Neighbors (KNN) graphs play a key role in a large range of applications. A KNN graph typically connects entities characterized by a set of features so that each entity becomes linked to its k most similar counterparts according to some similarity function. As datasets grow, KNN graphs are unfortunately becoming increasingly costly to construct, and the general approach, which consists in reducing the number of comparisons between entities, seems to have reached its full potential. In this paper we propose to overcome this limit with a simple yet powerful strategy that samples the set of features of each entity and only keeps the least popular features. We show that this strategy outperforms other more straightforward policies on a range of four representative datasets: for instance, keeping the 25 least popular items reduces computational time by up to 63%, while producing a KNN graph close to the ideal one.
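    A minimal sketch of the sampling idea, under my own simplifications: keep only each entity's least globally popular features, then build the KNN graph by brute force with Jaccard similarity over the sampled profiles. Function and parameter names are illustrative, and the brute-force comparison stands in for the paper's full pipeline.

    ```python
    # Keep each entity's `keep` least popular features, then build a KNN
    # graph over the sampled profiles with brute-force Jaccard similarity.
    from collections import Counter

    def sample_least_popular(profiles, keep):
        """profiles: dict entity -> set of features."""
        popularity = Counter(f for feats in profiles.values() for f in feats)
        return {e: set(sorted(feats, key=lambda f: popularity[f])[:keep])
                for e, feats in profiles.items()}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    def knn_graph(profiles, k):
        graph = {}
        for e, feats in profiles.items():
            scored = [(jaccard(feats, f2), e2) for e2, f2 in profiles.items() if e2 != e]
            graph[e] = [e2 for _, e2 in sorted(scored, reverse=True)[:k]]
        return graph

    if __name__ == "__main__":
        profiles = {"u1": {"star_wars", "dune", "alien"},
                    "u2": {"star_wars", "dune", "blade_runner"},
                    "u3": {"dune", "alien", "solaris"}}
        print(knn_graph(sample_least_popular(profiles, keep=2), k=1))
    ```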

    Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

    Get PDF
    Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r > 0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of equal opportunity: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN where all points in S that are near q have the same probability to be selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights (un)fairness in a recommendation setting on real-world datasets and discusses the inherent unfairness introduced by solving other variants of the problem.
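    The black-box construction can be pictured with a standard rejection-sampling sketch: draw a point from the union of the query's LSH buckets with probability proportional to its multiplicity, accept it with probability inversely proportional to that multiplicity, and keep it only if it is truly within distance r. The code below is an illustrative simplification under those assumptions, not the paper's data structure; `buckets` and `dist` are assumed inputs.

    ```python
    # Uniform sampling from the set of points that are both within
    # distance r of the query and present in at least one of its LSH
    # buckets: every such point is returned with probability
    # 1/(total bucket size) per attempt, hence uniformly on success.
    import random

    def fair_r_nn_sample(query, buckets, dist, r, max_tries=1000):
        """buckets: lists of points colliding with `query`, one list per hash table."""
        sizes = [len(b) for b in buckets]
        if sum(sizes) == 0:
            return None
        for _ in range(max_tries):
            i = random.choices(range(len(buckets)), weights=sizes)[0]
            p = random.choice(buckets[i])
            multiplicity = sum(p in b for b in buckets)
            if random.random() < 1.0 / multiplicity and dist(query, p) <= r:
                return p
        return None

    if __name__ == "__main__":
        pts = [(0.0,), (0.5,), (2.0,)]
        buckets = [[pts[0], pts[1]], [pts[1], pts[2]]]
        print(fair_r_nn_sample((0.0,), buckets, lambda a, b: abs(a[0] - b[0]), r=1.0))
    ```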
